Abstract

Remote sensing is a technology for acquiring data about distant substances, which is needed to build a knowledge model for applications such as classification. Hyperspectral images (HSI) have recently become a powerful tool whose main goal is to classify the points of a region. An HSI consists of more than a hundred bidirectional measurements, called bands (or simply images), of the same region, referred to as the Ground Truth map (GT). Some bands are irrelevant because they are affected by various atmospheric effects, others contain redundant information, and the high dimensionality of HSI features lowers the classification accuracy. All of these bands can be important for some applications, but for classification only a small subset of them is relevant. The central problem with HSI is therefore dimensionality reduction. Many studies use mutual information (MI) to select the relevant bands. Other studies use normalized forms of MI, such as Symmetric Uncertainty, in medical imaging applications. In this paper we introduce an algorithm that also uses MI to select relevant bands and applies the Symmetric Uncertainty coefficient to control redundancy and increase classification accuracy. The algorithm is a feature selection tool that follows a Filter strategy. We conduct this study on the HSI AVIRIS 92AV3C. The result is an effective and fast scheme for controlling redundancy.



Keywords: Hyperspectral images, Classification, Feature Selection, Mutual information, Redundancy.
1 Introduction

In feature classification, the choice of data strongly affects the results. The feature selection problem commonly arises when we have N features (or attributes) expressing N vectors of measurements for C substances (called classes). The problem is to find K vectors among the N that are relevant and not redundant, in order to classify the substances. The number K of selected vectors must be lower than N, because when N is very large, many cases are needed to detect the relationship between the vectors and the classes (Hughes phenomenon) [10]. Redundant features (vectors) must be avoided because they complicate the learning system and produce incorrect predictions [14]. Relevant vectors are those able to predict the classes. The hyperspectral image (HSI), as a set of more than a hundred bidirectional measurements (called bands) of the same region (called the ground truth map, GT), needs dimensionality reduction. Indeed, not all bands carry information: some are irrelevant, such as those affected by various atmospheric effects (see Figure 3), and they decrease the classification accuracy. Finally, there are redundant bands, which must be avoided. We can reduce the dimensionality of hyperspectral images by selecting only the relevant bands (feature selection or subset selection methodology), or by extracting from the original bands new bands containing the maximal information about the classes, using any functions, logical or numerical (feature extraction methodology) [8] [9] [11]. Here we introduce an algorithm based on mutual information that reduces dimensionality in two steps: first picking the relevant bands, then avoiding redundancy. We illustrate the principle of this algorithm using synthetic bands for the scene of the HSI AVIRIS 92AV3C [1], Figure 1. We then confirm its effectiveness by applying it to the real data of the HSI AVIRIS 92AV3C. Each pixel is represented as a vector of 220 components; Figure 2 shows the pixel vector notion [7]. Reducing dimensionality therefore means selecting only the dimensions carrying a lot of information about the classes.

The hyperspectral image AVIRIS 92AV3C (Airborne Visible Infrared Imaging Spectrometer) [2] contains 220 images taken over the "Indian Pines" region in north-western Indiana, USA [1]. These 220 images, called bands, are acquired between 0.4 µm and 2.5 µm. Each band has 145 lines and 145 columns. The ground truth map is also provided, but only 10366 pixels (49%) are labelled, from 1 to 16. Each label indicates one of the 16 classes. The zeros indicate pixels that are not yet classified (Figure 1). The hyperspectral image AVIRIS 92AV3C contains values between 955 and 9406. Each pixel of the ground truth map has a set of 220 numbers (measurements) along the hyperspectral image. These numbers represent the reflectance of the pixel in each band. So each pixel is represented as a vector of 220 components (Figure 2).

We can also note that not all bands carry information. In Figure 5, for example, we can see the impact of atmospheric effects on bands 155, 220 and other bands. This hyperspectral image thus illustrates the problem of dimensionality reduction.

Figure 1: The Ground Truth map of AVIRIS 92AV3C and the 16 classes





[Figure panels: image space; vector space (2 dimensions from N)]

Figure 2: The notion of pixel vector



2 Mutual Information Based Feature Selection

2.1 Definition of Mutual Information 

This is a measure of the information exchanged between two ensembles of random variables A and B:

$$I(A,B) = \sum p(A,B)\,\log_2 \frac{p(A,B)}{p(A)\,p(B)}$$



Considering the ground truth map and the bands as ensembles of random variables, we calculate their interdependence. Fano [14] has demonstrated that as the mutual information of the already selected features increases, the classification error probability decreases, according to the formula below:

$$\frac{H(C/X) - 1}{\log_2(N_c)} \le P_e \le \frac{H(C/X)}{\log 2}$$

with

$$\frac{H(C/X) - 1}{\log_2(N_c)} = \frac{H(C) - I(C;X) - 1}{\log_2(N_c)}$$

and

$$P_e \le \frac{H(C) - I(C;X)}{\log 2} = \frac{H(C/X)}{\log 2}$$

The conditional entropy H(C/X) is calculated between the ground truth map (i.e. the classes C) and the subset of candidate bands X. N_c is the number of classes. So when the features X have a higher mutual information with the ground truth map (i.e. they are closer to the ground truth map), the error probability is lower. But it is difficult to compute this joint mutual information I(C;X), given the high dimensionality [14].
Figure 4 shows the MI between the GT and the synthetic bands. Figure 6 shows the MI between the GT and the real bands of the HSI AVIRIS 92AV3C [1].
Many studies use a threshold to choose the relevant bands. Guo [3] uses mutual information to select the top-ranking band, and a filter-based algorithm to decide whether its neighbours are redundant. Sarhrouni et al. [17] also use a filter-strategy algorithm based on MI to select bands. A wrapper-strategy algorithm based on MI is also introduced by Sarhrouni et al. [18].
By thresholding, for example with a threshold of 0.4 (see Figure 5), we eliminate the uninformative bands A3, A7 and A9. With another threshold, we can retain fewer bands. We can visually verify the effectiveness of MI for choosing relevant features in Figure 4.
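
As an illustration of the definition of I(A,B) given in this section, the following minimal Python sketch estimates the mutual information of two flattened images by counting joint occurrences. It assumes the pixel values are already discrete (for example integer labels or quantized reflectances); the function name and the toy data are hypothetical and are not taken from the original study.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Estimate I(A,B) in bits from two equal-length sequences of discrete values."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    n = len(a)
    joint = Counter(zip(a, b))  # occurrences of each (a, b) pair
    count_a = Counter(a)        # occurrences of each value of A
    count_b = Counter(b)        # occurrences of each value of B
    mi = 0.0
    for (va, vb), c in joint.items():
        # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_a * count_b)
        mi += (c / n) * log2(c * n / (count_a[va] * count_b[vb]))
    return mi


# Toy flattened ground truth and two toy bands (hypothetical values).
gt = [0, 0, 1, 1, 2, 2, 2, 0]
identical_band = list(gt)      # carries all the information of the GT
constant_band = [5] * len(gt)  # carries no information about the GT

mi_identical = mutual_information(gt, identical_band)  # equals the entropy of gt
mi_constant = mutual_information(gt, constant_band)    # equals 0
```

A band identical to the GT reaches the maximum possible value (the entropy of the GT), while a constant band scores zero, which is the behaviour the relevance threshold relies on.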



2.2 Symmetric Uncertainty 

This is one of the normalized forms of Mutual Information, introduced by Witten and Frank, 2005 [19]. It is defined as below:

$$U(A,B) = 2\,\frac{I(A,B)}{H(A) + H(B)}$$

H(X) is the entropy of a set of random variables X. Some studies use this U for image registration in medical image processing [9]. Numerous studies use Normalized Mutual Information [20] [21] [22].

Figure 3 shows that symmetric uncertainty measures how much information is shared between A and B relative to all the information contained in both A and B.

Figure 3: Illustration of Symmetric Uncertainty
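
The symmetric uncertainty coefficient can be sketched in a few lines of Python. The sketch below assumes discrete-valued sequences and uses the standard identity I(A,B) = H(A) + H(B) - H(A,B); the function names, the convention of returning 0 when both inputs are constant, and the toy data are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(x):
    """Shannon entropy in bits of a sequence of discrete (hashable) values."""
    n = len(x)
    return -sum((c / n) * log2(c / n) for c in Counter(x).values())


def symmetric_uncertainty(a, b):
    """U(A,B) = 2 I(A,B) / (H(A) + H(B)), with I(A,B) = H(A) + H(B) - H(A,B)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0.0:
        return 0.0  # both inputs constant: convention assumed in this sketch
    h_ab = entropy(list(zip(a, b)))  # joint entropy H(A,B)
    return 2.0 * (h_a + h_b - h_ab) / (h_a + h_b)


# Toy sequences (hypothetical values).
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]  # statistically independent of a

u_same = symmetric_uncertainty(a, a)      # identical sequences: fully redundant
u_disjoint = symmetric_uncertainty(a, b)  # independent sequences: no redundancy
```

The coefficient lies between 0 (nothing shared) and 1 (one sequence fully determines the other), which is what makes a single redundancy threshold meaningful across pairs of bands.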

3 Principle of the Proposed Method and Algorithm

For this section we synthesize 19 bands from the GT (Figure 1) by adding noise, cutting out some substances, etc.; see Figure 4. Each band has 145 lines and 145 columns. The ground truth map is also provided, but only 10366 pixels are labelled, from 1 to 16. Each label indicates one of the 16 classes. The zeros indicate pixels that are not yet classified (Figure 2). The mutual information of the GT and the synthetic bands is shown in Figure 5.

3.1 Principle for Selecting Relevant Bands

With a threshold of 0.4 on the MI calculated in Figure 4, we obtain 16 relevant bands Ai:

with i = {1, 2, 3, 4, 5, 6, 8, 10, 11, 12, 14, 15, 16, 17, 18, 19}.

We can visually verify the resemblance between the GT and the more informative bands, both in the synthetic bands and in the real data bands of AVIRIS 92AV3C. See Figure 6.
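
The relevance step amounts to a threshold followed by a sort. The short Python sketch below shows this under the assumption that the MI of each band with the GT has already been computed; the function name and the MI values are hypothetical and do not reproduce the values of the synthetic bands.

```python
def select_relevant(mi_by_band, th_relevance):
    """Keep bands whose MI with the GT reaches the threshold, in ascending MI order."""
    kept = [band for band, mi in mi_by_band.items() if mi >= th_relevance]
    return sorted(kept, key=lambda band: mi_by_band[band])


# Hypothetical MI values for five bands (not the values of Figure 5).
mi_by_band = {1: 0.62, 2: 0.75, 3: 0.35, 4: 0.81, 5: 0.48}

relevant = select_relevant(mi_by_band, th_relevance=0.4)
# relevant == [5, 1, 2, 4]: band 3 falls below the threshold and is discarded
```

Returning the bands in ascending MI order prepares the input expected by the redundancy step described next.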

3.2 Principle of Non-Redundant Band Detection

First: we order the remaining bands in increasing order of their MI with the GT. So we have:

{A12, A8, A15, A6, A1, A3, A14, A16, A2, A10, A17, A4, A19, A5, A11, A18}

Second: we fix a threshold to control redundancy, here 0.7. Then we compute the Symmetric Uncertainty U(Ai, Aj) for all pairs of the ensemble:

S = {8, 15, 6, 1, 3, 14, 16, 2, 10, 17, 4, 19, 5, 11, 18}.

Figure 5: Mutual Information of GT and synthetic bands.



Observation 1: Figure 4 shows that band A17 is practically the same as A4. Table 1 shows U(A17, A4) close to 100% (0.95). This indicates high redundancy.

Observation 2: Figure 4 shows that bands A16 and A18 are practically disjoint, i.e. they are not redundant. Table 1 shows U(A16, A18) = 0.07. This indicates no redundancy. So the ensemble of selected bands becomes SS = {16, 18}. A16 and A18 will then be discarded from Table 1. Algorithm 1 shows more details.

We can now state this rule:

Rule: each candidate band is added to SS if and only if its Symmetric Uncertainty values with all elements of SS are less than the threshold (here 0.7).

Algorithm 1 implements this rule.
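
The rule can be expressed as a single predicate. The Python sketch below is a minimal illustration that uses the two symmetric uncertainty values quoted in the observations, U(A17, A4) = 0.95 and U(A16, A18) = 0.07; the function name and the dictionary layout keyed by unordered band pairs are assumptions made for this example.

```python
def can_add(candidate, selected, su, th_redundancy):
    """True if the candidate's symmetric uncertainty with every selected band is below the threshold."""
    return all(su[frozenset((candidate, s))] < th_redundancy for s in selected)


# Symmetric uncertainty values quoted in Observations 1 and 2,
# stored by unordered pair of band indices.
su = {
    frozenset((17, 4)): 0.95,
    frozenset((16, 18)): 0.07,
}

redundant_case = can_add(17, {4}, su, th_redundancy=0.7)   # False: 0.95 >= 0.7
disjoint_case = can_add(18, {16}, su, th_redundancy=0.7)   # True: 0.07 < 0.7
first_band = can_add(16, set(), su, th_redundancy=0.7)     # True: nothing selected yet
```

With an empty SS the condition holds trivially, so the first candidate examined is always accepted.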



Table 1: The Symmetric Uncertainty of the relevant synthetic bands.

Band indices in ascending order of their MI with the GT
4 Application on HSI AVIRIS 92AV3C

Algorithm 1 implements the proposed method.

We apply the proposed algorithm to the hyperspectral image AVIRIS 92AV3C [1]. 50% of the labelled pixels are randomly chosen and used for training, and the other 50% are used for testing the classification [3] [17] [18]. The classifier used is the SVM [5] [12] [4].
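
The random 50/50 partition of the labelled pixels can be sketched as follows. This is only an illustration under stated assumptions: the text does not say whether the split is stratified by class, so a plain random split is shown; the function name, the fixed seed and the toy label vector are hypothetical.

```python
import random


def split_labelled_pixels(labels, train_fraction=0.5, seed=0):
    """Randomly split the indices of labelled pixels (label != 0) into train and test sets."""
    labelled = [i for i, lab in enumerate(labels) if lab != 0]
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    rng.shuffle(labelled)
    cut = int(len(labelled) * train_fraction)
    return sorted(labelled[:cut]), sorted(labelled[cut:])


# Toy flattened ground truth: 0 marks unlabelled pixels (hypothetical values).
toy_labels = [0, 3, 0, 1, 2, 2, 0, 1, 3, 1]
train_idx, test_idx = split_labelled_pixels(toy_labels)
```

The pixel vectors at `train_idx` would then be used to fit the classifier and those at `test_idx` to measure the classification accuracy.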
Algorithm 1: Band is the HSI. Let Th_relevance be the threshold for selecting the more informative bands and Th_redundancy the threshold for redundancy control.

1) Compute the Mutual Information (MI) of the bands and the Ground Truth map.

2) Sort the bands in ascending order of their MI value.

3) Cut the bands that have a lower value than the threshold Th_relevance; the remaining subset is S.

4) Initialization: n = length(S), i = 1, D is a bidirectional array with all values = 1;
// any value greater than 1 can be used, it's useful in step 6)

5) Computation of the bidirectional data D(n,n):
for i = 1 to n step 1 do
    for j = i+1 to n step 1 do
        D(i,j) = U(Band_S(i), Band_S(j));
        // with U(A,B) = 2 I(A,B) / (H(A) + H(B))
    end for
end for

// Initialization of the output of the algorithm

6) SS = {};
while min(D) < Th_redundancy do
    // Pick up the argument of the minimum of D
    (x, y) = argmin(D(., .));
    if ∀ l ∈ SS, D(x,l) < Th_redundancy then
        // x is not redundant with the already selected bands
        SS = SS ∪ {x}
    end if
    if ∀ l ∈ SS, D(y,l) < Th_redundancy then
        // y is not redundant with the already selected bands
        SS = SS ∪ {y}
    end if
    D(x,y) = 1; D(y,x) = 1; // The cells D(x,y) and D(y,x) will not be checked as minimum again
end while

7) Finish: The final subset SS contains the bands selected according to the pair of thresholds (Th_relevance, Th_redundancy).
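
Step 6 of Algorithm 1 can be sketched in Python as follows. This is one possible reading of the pseudocode, not the authors' implementation: instead of an array D filled with 1, it keeps a dictionary of pairs still to be examined, and it looks up symmetric uncertainty values by unordered pair so that the redundancy test is symmetric. Only U(A16, A18) = 0.07 and U(A17, A4) = 0.95 come from the text; the other values and all names are hypothetical.

```python
from itertools import combinations


def select_non_redundant(bands, su, th_redundancy):
    """Greedy selection: examine pairs by increasing symmetric uncertainty."""
    # Pairs not yet examined, playing the role of the array D.
    pending = {(x, y): su[frozenset((x, y))] for x, y in combinations(bands, 2)}
    selected = []
    while pending:
        (x, y), u_min = min(pending.items(), key=lambda item: item[1])
        if u_min >= th_redundancy:
            break  # every remaining pair is considered redundant
        for band in (x, y):
            if band in selected:
                continue
            # Add the band only if it is non-redundant with all selected bands.
            if all(su[frozenset((band, s))] < th_redundancy for s in selected):
                selected.append(band)
        del pending[(x, y)]  # this pair will not be picked as minimum again
    return selected


# Relevant bands and their pairwise symmetric uncertainty.
bands = [16, 17, 4, 18]
su = {
    frozenset((16, 18)): 0.07,  # value quoted in Observation 2
    frozenset((17, 4)): 0.95,   # value quoted in Observation 1
    frozenset((16, 17)): 0.30,  # hypothetical
    frozenset((16, 4)): 0.32,   # hypothetical
    frozenset((17, 18)): 0.20,  # hypothetical
    frozenset((4, 18)): 0.22,   # hypothetical
}

ss = select_non_redundant(bands, su, th_redundancy=0.7)
# ss == [16, 18, 17]: band 4 is rejected because U(A17, A4) = 0.95 >= 0.7
```

On this toy input the first pair selected is (16, 18), matching SS = {16, 18} in Observation 2; band 17 is then accepted, and band 4 is rejected as redundant with band 17.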
4.1 Mutual Information Curve of Bands

From the remaining subset of bands, we must eliminate the uninformative ones by thresholding (see the proposed algorithm). Figure 6 gives the MI of the bands of the HSI AVIRIS 92AV3C with the ground truth GT.




[Figure panels: Band 170; Ground Truth map]

Figure 6: Mutual information of GT and AVIRIS bands.



4.2 Results 

From the remaining subset of bands, we must eliminate the redundant ones using the proposed algorithm. Table 2 gives the classification accuracy for a number of bands with several thresholds.



4.3 Discussion 

The results in Table 2 allow us to distinguish six zones of threshold pairs (TH, IM):

Zone 1: There is practically no control of relevance and no control of redundancy, so the algorithm has no effect.

Zone 2: This is a hard selection: only a few highly relevant and non-redundant bands are selected.

Zone 3: This is an interesting zone. We can easily reach 80% classification accuracy with about 40 bands.

Zone 4: This is the most important zone, where the algorithm behaves most usefully. For example, with as few as 19 bands we reach a classification accuracy of 80%.

Zone 5: Here we apply a hard control of redundancy, but the candidate bands are closer to the GT and may be more redundant, so we cannot obtain interesting results.

Zone 6: When relevance is not properly controlled, some bands affected by transfer effects may be non-redundant and can be selected, so the classification accuracy decreases.

Partial conclusion: this algorithm is very effective for redundancy and relevance control in the feature selection area.

The main difference between this algorithm and previous works is the separation of the two processes: avoiding redundancy and selecting the more informative bands. Sarhrouni et al. [17] also use a filter-strategy algorithm based on MI to select bands, as well as a wrapper-strategy algorithm also based on MI [18]. Guo [3] used a filter strategy with a threshold to control redundancy. In those works, however, the two processes, i.e. avoiding redundancy and avoiding uninformative bands, are carried out at the same time by the same threshold.







Figure 7: In the middle, the GT of AVIRIS 92AV3C. On the left, the ground truth map (GT) reconstructed with the proposed algorithm for TH=0.56 and MI=0.9; the accuracy is 84.16% for only 42 bands. On the right, the generalization of the classification to all Indian Pines regions.

Figure 7 illustrates the reconstruction of the ground truth map GT for a redundancy threshold of 0.56 and a relevance threshold IM = 0.9. The classification accuracy is 84.16% for the 42 selected bands. Figure 7 also gives a general classification of the entire Indian Pines scene [1]; the pixels not labelled in the GT are classified here. This illustrates the generalization power of the proposed method.

Table 2: Classification accuracy for several pairs of thresholds (TH, IM) and the corresponding number of bands retained.

MI: threshold for controlling relevance (MI of the bands with the Ground Truth)
N.B: number of bands retained for the pair of thresholds (MI, TH)
ac(%): classification accuracy calculated for the pair of thresholds (MI, TH)

[Table body: (N.B, ac) entries grouped by zone]



We can note here that Hui Wang [15] uses two axioms to characterize feature selection. Sufficiency axiom: the selected feature subset must be able to reproduce the training samples without losing information. Necessity axiom: "simplest among different alternatives is preferred for prediction". In the proposed algorithm, reducing the error uncertainty between the truth map and the estimated one minimizes the information lost for the training samples and also for the predicted ones. We also note that the number of selected features can be used as a condition to stop the search [16].






E. Sarhrouni, A. Hammouch and D. Aboutajdine 



5 Conclusion 

In data mining, feature selection in high dimensionality remains an open problem. Some heuristic methods and algorithms have to select relevant and non-redundant feature subsets. In this paper we introduce an algorithm that processes relevance and redundancy separately. We apply our method to classify the Indian Pines region with the hyperspectral image AVIRIS 92AV3C. This algorithm follows a Filter strategy (i.e. it makes no call to the classifier during selection). In the first step we use mutual information to pick the relevant bands by thresholding (like most methods already in use). The second step introduces a new algorithm that measures redundancy with the Symmetric Uncertainty coefficient. We conclude that our method and algorithm are effective at selecting relevant and non-redundant bands. This algorithm opens a wide range of possible fast applications. But the question remains open: there is no guarantee that the chosen bands are the optimal ones, because some redundancy can be important for reinforcing the learning of the classification system. The thresholds controlling relevance and redundancy are therefore a very useful tool for calibrating the selection in real-time applications. This is a very positive point for our algorithm: it can be implemented in a real-time application, because in commercial applications inexpensive filtering algorithms are strongly preferred.